Data Introduction


Motivation and Background

Working Women: A study on female participation in the labor force around the world
Historically, women around the world have faced barriers to entering and staying in the workforce. Using data from around the world, I will compare labor force participation rates in different countries and look at factors that may correlate with them.

Some of the research questions I will explore include:

  • How do female participation rates vary from country to country?

  • What variables in the data set correlate with female participation rates?

  • Do other variables, such as life expectancy and region, relate to each other?

The data used in this analysis come from The World Bank. I used the gender section to find variables related to gender differences in labor force participation. This dashboard focuses mainly on data from 2020, the most recent complete year of reporting.

Variable Explanations

There were over 60 numerical variables in the original data set. I selected the few with the most complete data in order to focus on some key indicators.

  • Country: the country of the observation
    -Not all countries are represented, and some have more data than others

  • Year: the year of the observation
    -The numerical variables female & male life expectancy and fertility rate have data for many countries back to 1960.
    -The variables female & male participation rate and female percentage of the labor force have data starting at 1990.

  • Region: the region of the country
    -There are 7 regions

  • Income Level: the income level of the country
    -There are 4 income levels
    -According to the World Bank, “the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year.” More about the income classification can be found here.

  • Male Life Expectancy: life expectancy at birth, male (years)

  • Female Life Expectancy: life expectancy at birth, female (years)

  • Fertility Rate: Number of children born per woman on average (births per woman)

  • Female Labor: Female labor force as a proportion of the total labor force (percentage)
    -Shows how active women are in relation to others in the labor force
    -The labor force is made up of people 15 or older that supply labor

  • Female Participation: Rate of women ages 15 or older that supply labor (percentage)

  • Male Participation: Rate of men ages 15 or older that supply labor (percentage)

Analysis

In both the summary statistics and correlation tabs, only data from 2020 will be used.

Summary Statistics

The summary statistics tab shows information about each of the variables in the data set.

The number of countries in each region and income group are shown at the top.

The minimum, mean, maximum and missing value percentage are shown for each of the numerical variables.

  • Female life expectancy is higher on average than male life expectancy

  • The male participation rate tends to be higher than the female participation rate

  • Both the female percentage of labor force and the female participation rate have a large amount of variation in the data


Correlation Plot

The correlation plot shows relationships between the numerical variables in the data set.

  • Male and female life expectancy are the most highly correlated variables in the data set. This is likely because men and women in the same country share similar living conditions.

  • Female life expectancy and fertility rate are the most strongly negatively correlated variables in the data set. This means that women tend to live longer in countries where the average fertility rate is lower.

  • Female participation and female labor are also very strongly positively correlated. As the percentage of women working rises, the percentage of the workforce that is female tends to rise.

  • Female participation is not strongly correlated with any other numerical variable in the data set.

In the next few tabs, I will explore the relationship between female participation and region and income.


Summary Statistics


Categorical Variables

 Region                          Income Group            
 East Asia & Pacific       :37   Low income         :28  
 Europe & Central Asia     :58   Lower middle income:54  
 Latin America & Caribbean :42   Upper middle income:54  
 Middle East & North Africa:21   High income        :80  
 North America             : 3   NA's               : 1  
 South Asia                : 8                           
 Sub-Saharan Africa        :48                           

Numerical Variables

Variable                          Min    Mean   Max    Missing Values (%)
Male Life Expectancy              51.45  70.57  82.9   8.29
Female Life Expectancy            55.88  75.47  88     8.29
Fertility Rate                    0.84   2.57   6.74   7.83
Female Percentage of Labor Force  8.27   41.17  54.91  13.82
Male Participation Rate           44.24  69.2   95.44  13.82
Female Participation Rate         6.08   49.69  83.05  13.82

Correlation

Exploration


Data Table


The table below shows the countries and corresponding variables from 2020.

Worldwide Map

Female Participation


Histogram

Next, I looked at the distribution of female participation. It is close to a normal distribution but slightly left-skewed, with a longer tail of low participation rates.


The mean value for female participation is 49.69%.

The country with the smallest percentage is Yemen, Rep. at 6.08%. Yemen is in the region category of Middle East & North Africa and is classified as low income.

The country with the highest participation is Solomon Islands at a rate of 83.05%. The Solomon Islands are classified as lower middle income and located in the East Asia & Pacific region.

Plot Analysis

Region
Regional differences are shown in two ways: a map of the 2020 values by country (Exploration tab) and a line plot of the average participation by region over time.

  • Both plots show that the Middle East & North Africa has the lowest participation rates, while Sub-Saharan Africa and North America have the highest.

  • The rates in Latin America & Caribbean and the Middle East & North Africa have changed the most in the past 30 years, with both regions seeing an increase of between 5% and 10%.

  • The gap between the Middle East & North Africa and Sub-Saharan Africa is around 30 percentage points as of 2020.


Income
The distribution by income shows some interesting results.

  • The differences in median participation between income levels are not as drastic as the differences between regions.

  • Low income countries have the highest average rate of female participation, followed by high income, upper middle income, and lower middle income.

  • The lower middle income category has the largest spread.

  • Surprisingly, the two ends of the income spectrum have the highest median rates of female participation.


Male Participation
Female and male participation rates are not strongly correlated.

  • In all regions, the average male participation is higher than the average female participation.

  • The average for male participation remains about the same for each region, but the average for female participation varies.


Region

Income

Male Participation

Regional Correlations


Fertility Rate

Income

Female Life Expectancy


Analysis

After looking at several correlations, region stood out as the variable with the strongest variation across the other variables. This section focuses on some of the regional differences.


Fertility Rate
The fertility rate in Sub-Saharan Africa is much higher than in the rest of the world.

  • Sub-Saharan Africa has an average rate of 4.24 births/woman, almost 2 births higher than any other region in the world.

  • South Korea has the lowest fertility rate with an average of 0.84 births/woman. Niger has the highest at 6.74.


Income
Income varies widely by region.

  • North America has the highest proportion of high income countries (all), followed by Europe & Central Asia at about 66%.

  • Half of the countries in Sub-Saharan Africa classify as low income, and South Asia has the next highest proportion of low income (12%).

  • The regions East Asia & Pacific, Middle East & North Africa, and Sub-Saharan Africa have at least one country per income level.


Female Life Expectancy
Female life expectancy has increased over time in all regions.

  • As of 2020, North America has the highest female life expectancy at 83.4 years, and Sub-Saharan Africa has the lowest with a value of 65.3 years.

  • South Asia’s life expectancy has increased the most, starting at 41.8 years and moving to 73.5 years (+31.7 years).

  • North America has small dips in the average life expectancy because Bermuda only has data for several years from 1960-2000, bringing the average down.

Conclusions


Summary

The share of women who participate in the workforce varies drastically around the world, and many variables are associated with female participation rates in different countries. After studying many of them, I learned that some hold more weight than others. Region, income, and female percentage of the labor force were the three variables most strongly related to female participation rates. In addition, region was a strong predictor for many of the variables in the data set, including fertility rate and life expectancy.

The data available placed limitations on this study. There were many more numerical variables I could have used in my analysis, but many of them had missing values for many countries. I focused my study on the relevant variables with the most data.

I made several assumptions throughout the analysis.

  • The missing values for each variable would not have a dramatic impact on my results. In 2020, there was no data for female participation from 14% of countries around the world. While many of these countries were small, I was still forced to exclude them from my results.

  • When grouping by region over time, not all countries had data going back as far as others. In addition, I did not weight the averages by population, something that could be improved upon in the future.
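
As a rough illustration of the population weighting mentioned above, base R's `weighted.mean()` could stand in for `mean()`. The rates and populations below are made up for the example and are not taken from the World Bank data.

```r
# Hypothetical participation rates (%) for three countries, one missing
rates       <- c(20, 60, NA)
# Hypothetical populations in millions, used as weights
populations <- c(100, 10, 50)

# Unweighted: every country with data counts equally
unweighted <- mean(rates, na.rm = TRUE)

# Weighted: drop the missing rate together with its weight
keep     <- !is.na(rates)
weighted <- weighted.mean(rates[keep], w = populations[keep])

c(unweighted = unweighted, weighted = weighted)
```

In this toy case the large country pulls the weighted average well below the unweighted one, which is the kind of shift a population-weighted regional average could reveal.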

I learned many things when putting this project together, both about R Markdown and labor participation rates. My biggest takeaway from my analysis is how strongly the variables were grouped by region. Countries in the same region were much more likely to have similar characteristics than countries grouped by any other indicator.


Future Work

While I narrowed this study down to 8 variables, much work could still be done on the remaining variables. It would be interesting to see the effect education had on female participation rates, as well as more analysis on the difference in male and female participation rates. In some regions the gender gap is shrinking, and an analysis on the reasons behind the closing gap would add context to my presentation.

Another way to study this data could be by country or region. I chose to give a broad overview of the world focusing on the year 2020, but it could be interesting to focus on certain regions.

References

All data used in this project came from The World Bank. This was a great resource for access to real data with applications. In addition to data, the World Bank has other resources that I used to help better understand the data after performing some initial analysis.

A resource I found very helpful was iMediaProf - his YouTube video taught me how to embed a Tableau view into my markdown file.

A big thank you to Dr. Chen for all her teaching and guidance!

---
title: "Working Women"
author: "Rachel Sebastian"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source_code: embed
    theme:
      bootswatch: zephyr
---

```{r setup, include=FALSE}
library(flexdashboard)
```

```{r imports}
# Machine-specific path: change this to the folder that contains the data/ directory
setwd("C:/Users/clari/Documents/School/Classes/MTH 209/final project")

library(pacman)

p_load(tidyverse, ggplot2, RColorBrewer, DataExplorer, vtable, scales, plotly)

#reading in files
# skip = 4 skips the four lines that precede the header row of the file
gender <- read_csv("data/gender.csv", skip = 4)
# Strip any "X" from the column names (for example "X1960" becomes "1960")
colnames(gender) <- gsub("X", "", colnames(gender), fixed = TRUE)

gender <- gender %>%
  rename(country_code = "Country Code", country_name = "Country Name",
         ind_code = "Indicator Code", ind_name = "Indicator Name")

region_income <- read_csv("data/region_income_level.csv") 
region_income <- region_income %>% rename(country_code = "Country Code", region = "Region", income_group = "IncomeGroup") %>% 
  select(country_code, region, income_group)

region_income <- region_income %>% subset(!is.na(country_code)) %>% subset(nchar(country_code) == 3)

#Creating data frame with wanted variables

indicator_names = c("m_life_exp","f_life_exp", "fertility_rate", "female_labor", "male_participation", "female_participation")

df <- gender %>% mutate(indicator = case_when(
  ind_code == "SP.DYN.LE00.MA.IN" ~ indicator_names[1],
  ind_code == "SP.DYN.LE00.FE.IN" ~ indicator_names[2],
  ind_code == "SP.DYN.TFRT.IN" ~ indicator_names[3],
  ind_code == "SL.TLF.TOTL.FE.ZS" ~ indicator_names[4],
  ind_code == "SL.TLF.CACT.MA.ZS" ~ indicator_names[5],
  ind_code == "SL.TLF.CACT.FE.ZS" ~ indicator_names[6]
  
))

df <- subset(df, !is.na(indicator))
df <- df %>% select(-c(ind_name, ind_code)) %>% select("indicator", "country_name", "country_code", everything())

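# Reshape to one row per country-year. This relies on every indicator having
# one row per country, in the same country order, and on columns 4:65 holding
# the years 1960 to 2021; unlist() then stacks the values year by year.
df <- data.frame(country = rep(unique(df$country_name), 62),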
                  country_code = rep(unique(df$country_code), 62),
                  year = rep(1960:2021, each = length(unique(df$country_name))),
                  m_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[1], 4:65]))),
                  f_life_exp = unname(unlist(as.vector(df[df$indicator==indicator_names[2], 4:65]))),
                  fertility_rate = unname(unlist(as.vector(df[df$indicator==indicator_names[3], 4:65]))),
                  female_labor = unname(unlist(as.vector(df[df$indicator==indicator_names[4], 4:65]))),
                  male_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[5], 4:65]))),
                  female_participation = unname(unlist(as.vector(df[df$indicator==indicator_names[6], 4:65])))
)

df <- df %>% left_join(region_income, by = "country_code") %>% 
  select(country, year, country_code, region, income_group, everything())

df <- df %>% mutate_if(is.character, as.factor)

df$income_group <- factor(df$income_group, levels = c("Low income", "Lower middle income", "Upper middle income", "High income"))

#Keeping 2020 only and taking out rows without a region, which are not individual countries
data_2020 <- df %>% subset(year == 2020) %>% select(-c("year")) %>% subset(!is.na(region))

#Averages by region and year
region_groups <- df %>%
  group_by(year, region) %>%
  summarise(avg_female_labor = mean(female_labor, na.rm = TRUE),
            avg_female_le = mean(f_life_exp, na.rm = TRUE),
            avg_female_participation = mean(female_participation, na.rm = TRUE),
            avg_male_participation = mean(male_participation, na.rm = TRUE),
            .groups = "drop")

```

Data Introduction
=======================================================================

Column {.tabset data-width=600 .tabset-fade}
-----------------------------------------------------------------------

### Motivation and Background

<font size="5"> **Working Women: A study on female participation in the labor force around the world**</font>  
Historically, women around the world have faced barriers to entering and staying in the workforce. Using data from around the world, I will compare labor force participation rates in different countries and look at factors that may correlate with them.

Some of the research questions I will explore include:

- How do female participation rates vary from country to country?

- What variables in the data set correlate with female participation rates?

- Do other variables, such as life expectancy and region, relate to each other?

The data used in this analysis come from [The World Bank](https://genderdata.worldbank.org/). I used the gender section to find variables related to gender differences in labor force participation. This dashboard focuses mainly on data from 2020, the most recent complete year of reporting.


### Variable Explanations

There were over 60 numerical variables in the original data set. I selected the few with the most complete data in order to focus on some key indicators.

- **Country**: the country of the observation  
  -Not all countries are represented, and some have more data than others  

- **Year**: the year of the observation  
  -The numerical variables female & male life expectancy and fertility rate have data for many countries back to 1960.  
  -The variables female & male participation rate and female percentage of the labor force have data starting at 1990.  

- **Region**: the region of the country  
  -There are 7 regions

- **Income Level**: the income level of the country  
  -There are 4 income levels  
  -According to the World Bank, "the classifications are updated each year on July 1 and are based on the GNI (Gross National Income) per capita of the previous year." More about the income classification can be found [here](https://blogs.worldbank.org/opendata/new-world-bank-country-classifications-income-level-2022-2023#).  
   
- **Male Life Expectancy**: life expectancy at birth, male (years)

- **Female Life Expectancy**: life expectancy at birth, female (years)

- **Fertility Rate**: Number of children born per woman on average (births per woman)

- **Female Labor**: Female labor force as a proportion of the total labor force (percentage)  
  -Shows how active women are in relation to others in the labor force  
  -The labor force is made up of people 15 or older that supply labor
  
- **Female Participation**: Rate of women ages 15 or older that supply labor (percentage)

- **Male Participation**: Rate of men ages 15 or older that supply labor (percentage)
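
To make the difference between the two female labor measures concrete, here is a minimal sketch with made-up head counts (the numbers and variable names are hypothetical, not World Bank data). It treats the participation rate as the labor force divided by the population aged 15 or older, and female labor as the female labor force divided by the total labor force.

```r
# Hypothetical head counts for one country (illustrative only)
women_15_plus  <- 1000   # women aged 15 or older
men_15_plus    <- 1000   # men aged 15 or older
women_in_labor <- 500    # women who supply labor
men_in_labor   <- 750    # men who supply labor

# Participation rate: share of each group that is in the labor force
female_participation <- 100 * women_in_labor / women_15_plus
male_participation   <- 100 * men_in_labor / men_15_plus

# Female labor: women as a share of the whole labor force
female_labor <- 100 * women_in_labor / (women_in_labor + men_in_labor)

c(female_participation = female_participation,
  male_participation   = male_participation,
  female_labor         = female_labor)
```

Because female labor also depends on how many men work, two countries with the same female participation rate can still differ on this measure.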

### Analysis

In both the summary statistics and correlation tabs, only data from 2020 will be used. 

**Summary Statistics**

The summary statistics tab shows information about each of the variables in the data set.

The number of countries in each region and income group are shown at the top. 

The minimum, mean, maximum and missing value percentage are shown for each of the numerical variables.   

- Female life expectancy is higher on average than male life expectancy  

- The male participation rate tends to be higher than the female participation rate  

- Both the female percentage of labor force and the female participation rate have a large amount of variation in the data 

----------------------------------------------------------------

**Correlation Plot**

The correlation plot shows relationships between the numerical variables in the data set.  

- Male and female life expectancy are the most highly correlated variables in the data set. This is likely because men and women in the same country share similar living conditions.

- Female life expectancy and fertility rate are the most strongly negatively correlated variables in the data set. This means that women tend to live longer in countries where the average fertility rate is lower.

- Female participation and female labor are also very strongly positively correlated. As the percentage of women working rises, the percentage of the workforce that is female tends to rise.

- Female participation is not strongly correlated with any other numerical variable in the data set.

In the next few tabs, I will explore the relationship between female participation and region and income.  

Column {.tabset data-width=400 .tabset-fade}
-----------------------------------------------------------------------

### Summary Statistics
<br>
<span style="color: lightgrey;">Categorical Variables</span>

``` {r summary_cat} 
region_income_table <- summary(data_2020 %>% select(region, income_group))
colnames(region_income_table) <- c("Region", "Income Group")
region_income_table
```

<span style="color: lightgrey;">Numerical Variables</span>

``` {r summary_num}
labs <- c('Male Life Expectancy',
          'Female Life Expectancy',
          'Fertility Rate',
          'Female Percentage of Labor Force',
          'Male Participation Rate',
          'Female Participation Rate')

st(data_2020 %>% select(-c("region", "income_group", "country", "country_code")),
         summ=c('min(x)',
                'mean(x)',
                'max(x)',
                'propNA(x)*100'),
         summ.names = c('Min',
                        'Mean',
                        'Max',
                        'Missing Values (%)'),
         title = "",
         digits = 2,
         labels = labs)
```

### Correlation
``` {r correlation}
corr <- data_2020 %>% select(-c("region", "income_group", "country", "country_code"))

plot_correlation(corr, cor_args = list("use" = "complete.obs"))
```
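
The `use = "complete.obs"` argument above is handed to `cor()`, which then drops every row that has a missing value in any column before computing the correlations. A small base R sketch with a made-up data frame (`toy` and its values are hypothetical) shows the effect:

```r
# Hypothetical data with missing values in two different rows
toy <- data.frame(
  f_life_exp     = c(80, 75, 70, 65, NA),
  fertility_rate = c(1.5, 2.0, 3.0, 4.0, 5.0),
  female_labor   = c(45, NA, 40, 38, 30)
)

# Rows 2 and 5 each contain an NA, so only rows 1, 3 and 4 are used
n_complete <- sum(complete.cases(toy))

# Correlation matrix computed on the complete rows only
corr_matrix <- cor(toy, use = "complete.obs")
round(corr_matrix, 2)
```

One consequence is that a variable with many missing values shrinks the sample used for every pair of variables, not only for its own correlations.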

Exploration
=======================================================================

Column {.tabset .tabset-fade}
----------------------------------------------------------------------
### Data Table

<br>
The table below shows the countries and corresponding variables from 2020.
<br>

``` {r view}
DT::datatable(df %>% filter(year == "2020", !is.na(region))) %>%
    DT::formatRound(columns=c("female_labor", "male_participation", "female_participation"), digits=3)
```

### Worldwide Map

<div class='tableauPlaceholder' id='viz1669924235929' style='position: relative'><noscript><a href='#'><img alt='Female Participation ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Fe&#47;FemaleParticipationWorldwide&#47;FemaleParticipation&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='FemaleParticipationWorldwide&#47;FemaleParticipation' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Fe&#47;FemaleParticipationWorldwide&#47;FemaleParticipation&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='en-US' /><param name='filter' value='publish=yes' /></object></div>                

``` {js, embedcode}
var divElement = document.getElementById('viz1669924235929');                    
var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='100%';vizElement.style.height=(divElement.offsetWidth*0.4)+'px';                    
var scriptElement = document.createElement('script');                    
scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    
vizElement.parentNode.insertBefore(scriptElement, vizElement);
```


Female Participation
=======================================================================

Column {.tabset data-width=550 .tabset-fade}
----------------------------------------------------------------------

### Histogram

Next, I looked at the distribution of female participation. It is close to a normal distribution but slightly left-skewed, with a longer tail of low participation rates.

``` {r f_hist}
ggplot(data_2020, aes(x= female_participation)) + geom_histogram(na.rm=T, binwidth = 5, col = "white", fill = "#1b2085") + labs(x = "Female Participation (%)", y = "Number of Countries", title = "Distribution of Female Participation in the Labor Force")
```
<br>

The mean value for female participation is `r round(mean(data_2020$female_participation, na.rm = T),2)`%. 

The country with the smallest percentage is `r data_2020[which.min(data_2020$female_participation), "country"]` at `r round(data_2020[which.min(data_2020$female_participation), "female_participation"],2)`%. This country is in the `r data_2020[which.min(data_2020$female_participation), "region"]` region and is classified as `r tolower(data_2020[which.min(data_2020$female_participation), "income_group"])`.

The country with the highest participation is `r data_2020[which.max(data_2020$female_participation), "country"]` at a rate of `r round(data_2020[which.max(data_2020$female_participation), "female_participation"],2)`%. The `r data_2020[which.max(data_2020$female_participation), "country"]` are classified as `r tolower(data_2020[which.max(data_2020$female_participation), "income_group"])` and located in the `r data_2020[which.max(data_2020$female_participation), "region"]` region.
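
One way to check the direction of the skew numerically is the sample skewness, which is negative for a left-skewed distribution. The sketch below uses a hypothetical helper, `sample_skewness()`, and made-up rates rather than the dashboard data; the same function could be applied to `data_2020$female_participation`.

```r
# Moment-based sample skewness (hypothetical helper; NAs are dropped first)
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Made-up participation rates with a long tail of low values
rates <- c(10, 35, 45, 50, 52, 55, 58, 60, 62, NA)

# A negative result means the longer tail is on the left
sample_skewness(rates)
```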

### Plot Analysis

**Region**  
Regional differences are shown in two ways: a map of the 2020 values by country (Exploration tab) and a line plot of the average participation by region over time.

- Both plots show that the Middle East & North Africa has the lowest participation rates, while Sub-Saharan Africa and North America have the highest.

- The rates in Latin America & Caribbean and the Middle East & North Africa have changed the most in the past 30 years, with both regions seeing an increase of between 5% and 10%.

- The gap between the Middle East & North Africa and Sub-Saharan Africa is around 30 percentage points as of 2020.

---------------------------------

**Income**  
The distribution by income shows some interesting results.

- The differences in median participation between income levels are not as drastic as the differences between regions.

- Low income countries have the highest average rate of female participation, followed by high income, upper middle income, and lower middle income.

- The lower middle income category has the largest spread.

- It is surprising that the two ends of the spectrum have the highest median rates of female participation 
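
The medians and spreads read off the box plot can also be tabulated directly. This is a sketch on a made-up data frame (`toy` and its values are hypothetical); with the dashboard data, the same calls would use `data_2020$female_participation` and `data_2020$income_group`.

```r
# Hypothetical participation rates for three income groups
toy <- data.frame(
  income_group = factor(
    c("Low income", "Low income", "Low income",
      "Lower middle income", "Lower middle income", "Lower middle income",
      "High income", "High income", "High income"),
    levels = c("Low income", "Lower middle income", "High income")
  ),
  female_participation = c(60, 65, 70, 20, 45, 75, 50, 55, NA)
)

# Median and interquartile range per income group, ignoring missing values
medians <- tapply(toy$female_participation, toy$income_group, median, na.rm = TRUE)
iqrs    <- tapply(toy$female_participation, toy$income_group, IQR, na.rm = TRUE)

medians
iqrs
```

The interquartile range is the height of each box, so the group with the largest value here is the one with the widest box in a plot of the same data.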

---------------------------------------

**Male Participation**  
Female and male participation rates are not strongly correlated.  

- In all regions, the average male participation is higher than the average female participation.

- The average for male participation remains about the same for each region, but the average for female participation varies.


Column {.tabset data-width=450 .tabset-fade}
--------------------------------------------------------------------

### Region

``` {r region_1}
ggplot(region_groups, aes(x = year, y = avg_female_participation, group = region, col = region)) + geom_line(na.rm = TRUE, linewidth = 1.2) + xlim(1988, 2022) + scale_color_brewer(palette = "Set2", na.translate = FALSE) + theme(legend.position="bottom") + guides(colour = guide_legend(title.position = "top")) + labs(title = "Average Female Participation by Region from 1990-2020", x = "Year", y = "Average Female Participation (%)", col = "Region") + theme(text = element_text(size=10))
```

### Income

``` {r income}
p1 <- ggplot(data_2020, aes(x=income_group, y = female_participation)) + geom_boxplot(na.rm = TRUE, fill = "#4F8073") +
  scale_x_discrete(na.translate = FALSE) + labs(x = "Income Level", y = "Female Participation (%)", title = "Female Participation Distribution by Income") + theme(text = element_text(size=13))

ggplotly(p1)
```

### Male Participation
``` {r f_perc}
participation <- region_groups %>% filter(year == 2020) %>% select(year, region, avg_female_participation, avg_male_participation) %>% 
  rename(Female = avg_female_participation, Male = avg_male_participation)

participation_long <- gather(participation, gender, participation, Female:Male)

participation_plot <- ggplot(participation_long, aes(x = region, y = participation, fill = gender, text = paste0("Region: ", region, "\nGender: ", gender, "\nParticipation: ", round(participation, 2), "%"))) + geom_bar(stat = "identity", position = "dodge", na.rm = T) + scale_x_discrete(na.translate = FALSE, labels = label_wrap(12)) + labs(x = "Region", y = "Average Participation Rate (%)", title = "Participation Rates by Region and Gender", fill = "Gender") + scale_fill_manual(values = c("#008395", "#95B0B6"))  + theme(text = element_text(size=13))

ggplotly(participation_plot, tooltip = "text")
```

Regional Correlations
=======================================================================

Column {.tabset data-width=500 .tabset-fade}
-----------------------------------------------------------------------

### Fertility Rate

``` {r r_map}
library(maps)

map <- map_data("world")

#Recoding names to match data set
map$region <- map$region %>% recode("USA" = "United States",
                                    "Venezuela" = "Venezuela, RB",
                                    "Egypt" = "Egypt, Arab Rep.",
                                    "Iran" = "Iran, Islamic Rep.",
                                    "North Korea" = "Korea, Dem. People's Rep.",
                                    "South Korea" = "Korea, Rep.",
                                    "Turkey" = "Turkiye",
                                    "Yemen" = "Yemen, Rep.",
                                    "Laos" = "Lao PDR",
                                    "Russia" = "Russian Federation",
                                    "Syria" = "Syrian Arab Republic",
                                    "Democratic Republic of the Congo" = "Congo, Dem. Rep.",
                                    "Republic of Congo" = "Congo, Rep.",
                                    # French Guiana is a French overseas department, not Guyana, so it is left unmatched
                                    "Kyrgyzstan" = "Kyrgyz Republic",
                                    "Ivory Coast" = "Cote d'Ivoire",
                                    "Virgin Islands" = "Virgin Islands (U.S.)",
                                    "Saint Vincent" = "St. Vincent and the Grenadines",
                                    "Trinidad" = "Trinidad and Tobago",
                                    "Sint Maarten" = "Sint Maarten (Dutch part)",
                                    "Slovakia" = "Slovak Republic",
                                    "Gambia" = "Gambia, The",
                                    "UK" = "United Kingdom",
                                    "Saint Martin" = "St. Martin (French part)",
                                    "Saint Lucia" = "St. Lucia",
                                    "Antigua" = "Antigua and Barbuda",
                                    "Bahamas" = "Bahamas, The"
                                    )

gender_map <- data_2020 %>% left_join(map, by = c("country"="region"))

```

``` {r fertility_map, fig.height = 7, fig.width = 12}
f <- ggplot(gender_map, aes(long, lat)) + 
  geom_polygon(aes(group = group, fill = fertility_rate, text = paste0(country, ": ", round(fertility_rate, 2), " births/woman"))) +
  scale_fill_viridis_c(option = "D") + labs(fill = "Fertility Rate") + 
  coord_map() + theme_minimal() +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank(), axis.title.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), panel.grid.major = element_blank(), panel.background = element_blank(), legend.position = "none")

ggplotly(f, tooltip = "text")
```

###  Income

``` {r income_region}
income_plot <- ggplot(data_2020, aes(x = region, fill = income_group)) + 
  geom_bar(position = "fill", na.rm = TRUE) + scale_x_discrete(na.translate = FALSE, labels = label_wrap(12)) + scale_y_continuous(breaks = seq(0,1,by = .2), labels = percent) + scale_fill_manual(values = c("#472d30", "#723d46", "#ad2a56", "#ba8466", "#03071e")) + theme(legend.position="top") + labs(title = "Income Levels by Region", x = "Region", y = "Percentage", fill = "") + theme(text = element_text(size=10))

income_plot
```

### Female Life Expectancy

``` {r f_le_region}
ggplot(region_groups, aes(x = year, y = avg_female_le, group = region, col = region)) + geom_line(na.rm = TRUE, linewidth = 1.2) + scale_color_brewer(palette = "Set2", na.translate = FALSE) + 
  theme(legend.position="bottom") + guides(colour = guide_legend(title.position = "top")) + labs(title = "Female Life Expectancy by Region from 1960-2020", x = "Year", y = "Average Female Life Expectancy (years)", col = "Region") + theme(text = element_text(size=10))
```

Column {data-width=500}
-----------------------------------------------------------------------

### Analysis

After looking at several correlations, region emerged as the variable associated with the strongest variation across the other variables. This section focuses on some of the regional differences.

-----------

**Fertility Rate**  
The fertility rate in Sub-Saharan Africa is much higher than in the rest of the world.

- Sub-Saharan Africa has an average rate of 4.24 births/woman, almost 2 births/woman higher than any other region in the world.

- South Korea has the lowest fertility rate at 0.84 births/woman. Niger has the highest at 6.74. 

------------

**Income**  
Income varies widely by region.

- North America has the highest proportion of high-income countries (100%), followed by Europe & Central Asia at about 66%.

- Half of the countries in Sub-Saharan Africa are classified as low income, and South Asia has the next highest proportion of low-income countries (12%).

- The regions East Asia & Pacific, Middle East & North Africa, and Sub-Saharan Africa have at least one country per income level.
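
The proportions above are within-region shares, the same quantity the stacked bar chart draws with `position = "fill"`. As a minimal sketch of that calculation in base R, using a made-up toy table (the region names `X` and `Y` and the values are hypothetical, not World Bank data), shown here as a display-only block:

```r
# Toy data (hypothetical): one row per country.
toy_income <- data.frame(
  region       = c("X", "X", "X", "X", "Y", "Y"),
  income_group = c("High", "High", "Low", "High", "High", "Low")
)

# Count countries per region and income group, then divide each row
# (region) by its total so the shares within a region sum to 1.
income_counts <- table(toy_income$region, toy_income$income_group)
income_shares <- prop.table(income_counts, margin = 1)

# Express the shares as whole percentages for reading.
print(round(100 * income_shares))
```

In this toy table, three of the four countries in region `X` are high income, so its high-income share is 75%.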

-------------

**Female Life Expectancy**  
Female life expectancy has increased over time in every region.

- As of 2020, North America has the highest female life expectancy at 83.4 years, and Sub-Saharan Africa has the lowest with a value of 65.3 years.

- South Asia's life expectancy has increased the most, starting at 41.8 years and moving to 73.5 years (+31.7 years).

- The North America line shows small dips because Bermuda has data for only several years between 1960 and 2000; in the years it is included, it brings the regional average down.
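
To illustrate how a country with partial data can create dips in an unweighted regional average, here is a minimal sketch in base R. The countries `A`, `B`, and `C` and all values are made up for illustration; they are not the World Bank figures behind the chart.

```r
# Toy data (hypothetical values): countries A and B report every year,
# country C reports only in 1961 and has a lower life expectancy.
toy_le <- data.frame(
  country   = c("A", "B", "A", "B", "C", "A", "B"),
  year      = c(1960, 1960, 1961, 1961, 1961, 1962, 1962),
  female_le = c(80, 82, 80, 82, 72, 80, 82)
)

# Unweighted average per year over whichever countries have data that year.
avg_by_year <- aggregate(female_le ~ year, data = toy_le, FUN = mean)

# Number of countries contributing to each yearly average.
n_by_year <- aggregate(country ~ year, data = toy_le, FUN = length)
names(n_by_year)[2] <- "n_countries"

# The average drops in 1961 only because a third country enters the mean.
print(merge(avg_by_year, n_by_year, by = "year"))
```

No country's value changes in this toy example, yet the yearly average moves whenever the set of reporting countries changes.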

Conclusions
=======================================================================

Column {data-width=500}
-----------------------------------------------------------------------
###  Summary

The rate at which women participate in the workforce varies drastically around the world, and many variables are associated with female participation rates in different countries. After studying many of them, I learned that some hold more weight than others. Region, income, and female percentage of the labor force were the three variables that correlated most strongly with female participation rates. In addition, region was a strong predictor for many of the variables in the data set, including fertility rate and life expectancy.

The data available placed limitations on this study. There were many more numerical variables I could have used in my analysis, but many of them had missing values for a large number of countries. I focused my study on the relevant variables with the most data. 

I made several assumptions throughout the analysis.

- The missing values for each variable would not have a dramatic impact on my results. In 2020, there was no female participation data for 14% of countries around the world. While many of these countries were small, I still had to exclude them from my results.

- When grouping by region over time, not all countries had data going back as far as others. In addition, I did not weight the averages by population, something that could be improved upon in the future. 
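
As a rough sketch of the population-weighting improvement mentioned above, the base R example below compares an unweighted regional mean with one weighted by population. The `population` column is an assumption (it is not one of the variables used in this dashboard), and the regions `R1` and `R2` and all values are made up.

```r
# Toy data (hypothetical): population is in millions and is NOT part of
# the data set analyzed in this dashboard.
toy_pop <- data.frame(
  region     = c("R1", "R1", "R2", "R2"),
  female_le  = c(70, 80, 60, 65),
  population = c(1, 9, 5, 5)
)

# Unweighted: every country counts equally, as in the analysis above.
unweighted_avg <- tapply(toy_pop$female_le, toy_pop$region, mean)

# Weighted: each country counts in proportion to its population.
weighted_avg <- sapply(
  split(toy_pop, toy_pop$region),
  function(d) weighted.mean(d$female_le, w = d$population)
)

print(rbind(unweighted = unweighted_avg, weighted = weighted_avg))
```

In region `R1` the larger country dominates the weighted mean, while in `R2`, where populations are equal, the two averages agree.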

I learned many things when putting this project together, both about R Markdown and labor participation rates. My biggest takeaway from my analysis is how strongly the variables were grouped by region. Countries in the same region were much more likely to share characteristics than countries grouped by any other indicator.


Column {data-width=500}
-----------------------------------------------------------------------

### Future Work

While I narrowed this study down to 8 variables, much work could still be done on the remaining variables. It would be interesting to see the effect education has on female participation rates, and to analyze further the difference between male and female participation rates. In some regions the gender gap is shrinking, and an analysis of the reasons behind the closing gap would add context to my presentation. 

Another way to study this data could be by country or region. I chose to give a broad overview of the world focusing on the year 2020, but it could be interesting to focus on certain regions.

###  References

All data used in this project came from [The World Bank](https://genderdata.worldbank.org/). This was a great resource for access to real data with applications. In addition to data, the World Bank has other [resources](https://genderdata.worldbank.org/data-stories/flfp-data-story/) that I used to help better understand the data after performing some initial analysis.

A resource I found very helpful was [iMediaProf](https://www.youtube.com/watch?v=yBIfRS56gjo); his YouTube video taught me how to embed a Tableau view into my markdown file.

A big thank you to Dr. Chen for all her teaching and guidance!